Generating multiple-accent pronunciations for TTS using joint sequence model interpolation
Abstract
Standard grapheme-to-phoneme (G2P) systems are trained using a homogeneous lexicon, for example one associated with a particular accent. In practice, a synthesis system may be required to handle multiple accents. Furthermore, a speaker rarely has a pure accent; accents vary continuously within and between regions of a country. Generating phonetic sequences for each accent is possible, but combining them to yield a single synthesis pronunciation is highly challenging. To address this problem, this paper considers a space of accents. The bases of this space are defined by statistical G2P models in the form of graphone models. A linear combination of these models defines the accent space. By selecting a point in this continuous space, it is possible to specify the accent for an individual speaker. The performance of this approach is evaluated using an accent space defined by American, Scottish and British English. By moving around the accent space, it is shown that it is possible to synthesize speech from all these accents as well as a range of intermediate points.
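The abstract does not spell out how the graphone models are combined. As a minimal sketch, assuming the linear combination is applied to the conditional probabilities of graphone units (grapheme-phoneme pairs) with non-negative weights that sum to one, a point in the accent space could be computed as below. The function name interpolate, the phoneme labels and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

# A toy "model": P(graphone | some fixed history), keyed by (grapheme, phoneme).
# Real graphone models are n-gram models over many histories; this is one slice.
Graphone = Tuple[str, str]
Model = Dict[Graphone, float]


def interpolate(models: Sequence[Model], weights: Sequence[float]) -> Model:
    """Linearly combine per-accent distributions (assumed interpolation scheme)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Collect every graphone seen by any accent model.
    units = set().union(*(m.keys() for m in models))
    # A unit missing from a model contributes probability 0 for that accent.
    return {u: sum(w * m.get(u, 0.0) for m, w in zip(models, weights)) for u in units}


# Invented distributions for the vowel in a word like "bath" (illustrative only).
american = {("a", "ae"): 0.9, ("a", "aa"): 0.1}
british = {("a", "ae"): 0.2, ("a", "aa"): 0.8}
scottish = {("a", "a"): 0.7, ("a", "ae"): 0.3}

# A point halfway between the American and British corners of the accent space.
mixed = interpolate([american, british, scottish], [0.5, 0.5, 0.0])
best = max(mixed, key=mixed.get)
```

Setting one weight to 1 recovers the corresponding pure-accent model, while intermediate weights give the intermediate points the abstract refers to. Because each input is a probability distribution and the weights sum to one, the result is again a valid distribution.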
Similar resources
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks
Word pronunciations, consisting of phoneme sequences and the associated syllabification and stress patterns, are vital for both speech recognition and text-to-speech (TTS) systems. For speech recognition phoneme sequences for words may be learned from audio data. We train recurrent neural network (RNN) based models to predict the syllabification and stress pattern for such pronunciations making...
Intonation modeling for TTS using a joint extraction and prediction approach
This paper presents a joint extraction and prediction framework for intonation modeling. The intonation model is based on a superpositional approach using Bézier curves. The components are attached to minor phrase and accent group. A greedy algorithm performs successive partitions on training data using linguistic information. The parameters related to each partition are obtained using a global ...
Automatic rule generation for linguistic features analysis using inductive learning technique: linguistic features analysis in TOS drive TTS system
The linguistic features analysis for input text plays an important role in achieving natural prosodic control in text-to-speech (TTS) systems. In a conventional scheme, experts refine suspicious if-then rules and change the tree structure manually to obtain correct analysis results when input texts have been analyzed incorrectly. However, altering the tree structure drastically is difficul...
Data-driven phonetic comparison and conversion between South African, British and American English pronunciations
We analyse pronunciations in American, British and South African English pronunciation dictionaries. Three analyses are performed. First, the accuracy is determined with which decision tree based grapheme-to-phoneme (G2P) conversion can be applied to each accent. It is found that there is little difference between the accents in this regard. Secondly, pronunciations are compared by performing pai...
Comparing direct G2P with G2P followed by accent conversion when determining pronunciations for South African English
It has been shown that techniques known as grapheme-and-phoneme-to-phoneme (GP2P) conversion can be used to derive pronunciations in a poorly-resourced accent, such as South African English, using available pronunciations in better-resourced accents of the same language, such as British and American English. However, if the pronunciation is not available in either accent, it must be obtained usi...